A Simple Algorithm for Estimating Distribution Parameters from -Dimensional Randomized Binary Responses
Randomized response is attractive for privacy-preserving data collection
because the provided privacy can be quantified by means such as differential
privacy. However, recovering and analyzing statistics involving multiple
dependent randomized binary attributes can be difficult, posing a significant
barrier to use. In this work, we address this problem by identifying and
analyzing a family of response randomizers that change each binary attribute
independently with the same probability. Modes of Google's RAPPOR randomizer, as
well as applications of two well-known classical randomized response methods
(Warner's original method and Simmons' unrelated question method), belong to this
family. We show that randomizers in this family transform multinomial
distribution parameters by an iterated Kronecker product of an invertible and
bisymmetric matrix. This allows us to present a simple and
efficient algorithm for obtaining unbiased maximum likelihood parameter
estimates for -way marginals from randomized responses and provide
theoretical bounds on the statistical efficiency achieved. We also describe the
tradeoff between efficiency and differential privacy. Importantly, both randomization of
responses and the estimation algorithm are simple to implement, an aspect
critical to technologies for privacy protection and security.
Comment: Accepted at Information Security: 21st International Conference, ISC
2018. Adapted to meet article length requirements. Fixed typo. Results
unchanged.
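A minimal sketch of the randomizer family and an inversion-based estimator, assuming NumPy and a single flip probability `p` shared by all attributes. The helper names (`randomize`, `cell_counts`, `estimate`) are hypothetical, and this is not the authors' implementation. It only illustrates why the Kronecker structure keeps estimation cheap: the inverse of a Kronecker product is the Kronecker product of the inverses, so the 2×2 inverse can be applied one attribute axis at a time instead of forming a 2^n × 2^n matrix.

```python
import numpy as np


def randomize(bits: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Flip every binary attribute independently with the same probability p."""
    flips = (rng.random(bits.shape) < p).astype(bits.dtype)
    return np.bitwise_xor(bits, flips)


def cell_counts(bits: np.ndarray) -> np.ndarray:
    """Relative frequencies of the 2**n cells; column 0 is the most significant bit."""
    n = bits.shape[1]
    weights = 1 << np.arange(n - 1, -1, -1)
    return np.bincount(bits @ weights, minlength=2**n) / bits.shape[0]


def estimate(observed: np.ndarray, n: int, p: float) -> np.ndarray:
    """Undo observed = (M ⊗ ... ⊗ M) @ true by applying M^{-1} along each axis."""
    # M is the bisymmetric 2x2 flip matrix; it is invertible only when p != 0.5.
    m_inv = np.linalg.inv(np.array([[1 - p, p], [p, 1 - p]]))
    t = observed.reshape((2,) * n)
    for axis in range(n):
        # Contract M^{-1} with one attribute axis, then move the result back into place.
        t = np.moveaxis(np.tensordot(m_inv, t, axes=([1], [axis])), 0, axis)
    return t.reshape(-1)


rng = np.random.default_rng(0)
true_bits = (rng.random((200_000, 3)) < [0.2, 0.5, 0.7]).astype(np.int64)
true_bits[:, 1] |= true_bits[:, 0]  # make attributes 0 and 1 dependent
noisy = randomize(true_bits, 0.2, rng)
est = estimate(cell_counts(noisy), 3, 0.2)
print(np.round(est, 3))
print(np.round(cell_counts(true_bits), 3))
```

With p < 1/2, one attribute's likelihood ratio is at most (1 − p)/p, so each attribute on its own satisfies ln((1 − p)/p)-differential privacy. Raising p strengthens privacy but shrinks the second eigenvalue 1 − 2p of the 2×2 matrix, which inflates the variance of the inverted estimates. In small samples, this plain inversion can also produce cell estimates outside [0, 1].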
Approximation properties of haplotype tagging
BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. RESULTS: It is shown that the tagging problem is NP-hard and approximable within 1 + ln((n² − n)/2) for n haplotypes, but not approximable within (1 − ε) ln(n/2) for any ε > 0 unless NP ⊂ DTIME(n^(log log n)). A simple, easily implementable algorithm that attains the above upper bound on solution quality is presented. This algorithm has running time O([Image: see text] (2m − p + 1)) ≤ O(m(n² − n)/2), where p ≤ min(n, m), for n haplotypes of size m. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. CONCLUSION: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single-processor machine. Hence, significant reductions in computational effort can only be expected if the computation is distributed and done in parallel.
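A minimal sketch, assuming the tagging problem is read as a set cover over haplotype pairs: choose SNP columns so that every pair of distinct haplotypes differs at some chosen column. With at most (n² − n)/2 pairs, the classic greedy set-cover guarantee of 1 + ln(largest set size) is at most 1 + ln((n² − n)/2), matching the upper bound quoted above. Whether the paper's algorithm is exactly this greedy procedure is our assumption, and `greedy_tag_snps` is a hypothetical name.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes: list[str]) -> list[int]:
    """Greedily choose SNP columns until all distinct haplotypes are told apart."""
    n, m = len(haplotypes), len(haplotypes[0])
    # Universe: every pair of distinct haplotypes that must be distinguished.
    uncovered = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if haplotypes[i] != haplotypes[j]
    }
    # Column s "covers" the pairs whose alleles differ at s.
    covers = [
        {(i, j) for (i, j) in uncovered if haplotypes[i][s] != haplotypes[j][s]}
        for s in range(m)
    ]
    chosen: list[int] = []
    while uncovered:
        # Set-cover greedy step: take the column covering the most remaining pairs.
        best = max(range(m), key=lambda s: len(covers[s] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


haps = ["0011", "0101", "0110", "1111"]
tags = greedy_tag_snps(haps)
print(tags)  # [0, 1, 2]
```

Restricted to the chosen columns, every haplotype in the example has a unique allele pattern, which is exactly the tagging requirement under this formulation.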
Perceptions of molecular epidemiology studies of HIV among stakeholders
Background: Advances in viral sequence analysis make it possible to track the spread of infectious pathogens, such as HIV, within a population. When used to study HIV, these analyses (i.e., molecular epidemiology) potentially allow inference of the identity of individual research subjects. Current privacy standards are likely insufficient for this type of public health research. To address this challenge, it will be important to understand how stakeholders feel about the benefits and risks of such research. Design and Methods: To better understand perceived benefits and risks of these research methods, in-depth qualitative interviews were conducted with HIV-infected individuals, individuals at high risk of contracting HIV, and professionals in HIV care and prevention. To gather additional perspectives, attendees of a public lecture on molecular epidemiology were asked to complete an informal questionnaire. Results: Among those interviewed and polled, there was near-unanimous support for using molecular epidemiology to study HIV. Questionnaires showed strong agreement about the benefits of molecular epidemiology, but diverse attitudes regarding risks. Interviewees acknowledged several risks, including privacy breaches and provocation of anti-gay sentiment. The interviews also suggested that misunderstandings about molecular epidemiology may affect how risks and benefits are evaluated. Conclusions: While nearly all study participants agree that the benefits of HIV molecular epidemiology outweigh the risks, concerns about privacy must be addressed to ensure continued trust in research institutions and willingness to participate in research.
Differential privacy for symmetric log-concave mechanisms
Adding random noise to database query results is an important tool for
achieving privacy. A challenge is to minimize this noise while still meeting
privacy requirements. Recently, a sufficient and necessary condition for
(ε, δ)-differential privacy for Gaussian noise was published.
This condition allows the computation of the minimum privacy-preserving scale
for this distribution. We extend this work and provide a sufficient and
necessary condition for (ε, δ)-differential privacy for all
symmetric and log-concave noise densities. Our results allow fine-grained
tailoring of the noise distribution to the dimensionality of the query result.
We demonstrate that this can yield significantly lower mean squared errors than
those incurred by the currently used Laplace and Gaussian mechanisms for the
same ε and δ.
Comment: AISTATS 2022, v2 corrects typo.
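For the Gaussian case, we assume the published necessary and sufficient condition mentioned above is the analytic Gaussian mechanism characterization of Balle and Wang (ICML 2018). Under it, N(0, σ²) noise on a query with ℓ₂-sensitivity Δ is (ε, δ)-differentially private exactly when Φ(Δ/(2σ) − εσ/Δ) − e^ε Φ(−Δ/(2σ) − εσ/Δ) ≤ δ. The sketch below uses hypothetical helper names and only the standard library. It finds the minimum privacy-preserving σ by bisection, since the left-hand side decreases as σ grows. It does not implement the paper's extension to general symmetric log-concave densities.

```python
import math


def std_normal_cdf(x: float) -> float:
    # erfc keeps precision in the far tails, where the two CDF terms nearly cancel.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(sigma: float, eps: float, sensitivity: float) -> float:
    """Smallest delta such that N(0, sigma^2) noise gives (eps, delta)-DP."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(eps) * std_normal_cdf(-a - b)


def min_gaussian_sigma(eps: float, delta: float, sensitivity: float = 1.0) -> float:
    """Bisection for the smallest sigma with gaussian_delta(sigma) <= delta."""
    lo, hi = 1e-6, 1.0  # invariant: lo is infeasible, hi is feasible
    while gaussian_delta(hi, eps, sensitivity) > delta:  # grow until feasible
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, eps, sensitivity) > delta:
            lo = mid
        else:
            hi = mid
    return hi


eps, delta = 0.5, 1e-5
sigma = min_gaussian_sigma(eps, delta)
classic = math.sqrt(2.0 * math.log(1.25 / delta)) / eps  # textbook bound, valid for eps < 1
print(f"analytic sigma = {sigma:.3f}, classic sigma = {classic:.3f}")
```

For a one-dimensional query, the Laplace mechanism with scale Δ/ε is ε-differentially private and has mean squared error 2(Δ/ε)². That figure can be set against the σ² this sketch yields.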
A Note on the Hardness of the k-Ambiguity Problem
We address the problem of k-ambiguating data with minimal information loss, a problem related to disclosure control in disseminated data. We show that this problem is NP-hard by considering cell suppression as the ambiguation mechanism. Along the way, we prove that the minimum k-union problem (also known as minimum k-coverage or maximum k-intersection), which is the problem of selecting k sets from a collection of n sets such that the cardinality of their union is minimized, is NP-hard. We also show that if the cardinality of the sets in the collection is bounded by a constant, this restricted problem is in APX.
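To make the minimum k-union problem concrete, here is a brute-force sketch (hypothetical name `min_k_union`) that checks all C(n, k) choices of k sets. Its running time grows combinatorially with n and k. It only illustrates the problem definition and is not an algorithm from the note.

```python
from itertools import combinations


def min_k_union(sets: list[set[int]], k: int) -> tuple[tuple[int, ...], int]:
    """Return indices of k sets whose union is smallest, and that union's size."""
    if not 1 <= k <= len(sets):
        raise ValueError("k must be between 1 and the number of sets")
    best_choice: tuple[int, ...] = ()
    best_size = len(set().union(*sets)) + 1  # larger than any k-union
    for choice in combinations(range(len(sets)), k):
        size = len(set().union(*(sets[i] for i in choice)))
        if size < best_size:  # keep the first minimizer found
            best_choice, best_size = choice, size
    return best_choice, best_size


collection = [{1, 2, 3}, {2, 3}, {3, 4}, {5, 6, 7}]
print(min_k_union(collection, 2))  # ((0, 1), 3)
```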